
Unpromotables skip replication and peer recovery #93210

Merged

Conversation

kingherc
Contributor

For skipping replication:

  • ReplicationTracker and Group filter promotable
  • Remove from in sync allocations in metadata

Fixes ES-4861

For skipping peer recovery:

  • Shards pass directly to STARTED, skipping intermediate peer recovery stages and messages

Fixes ES-5257
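As a rough illustration of the replication-side change, here is a minimal sketch with hypothetical stand-in types (`PromotableFilter` and `ShardCopy` are illustrative names, not the real `ShardRouting`/`ReplicationTracker` API) of filtering a shard's copies down to the promotable ones before replicating:

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy model: only shard copies that are promotable to primary participate
// in replication; unpromotable copies are skipped entirely.
public class PromotableFilter {
    // Minimal stand-in for ShardRouting; field names are illustrative only.
    public record ShardCopy(String nodeId, boolean promotableToPrimary) {}

    // Mirrors the idea of the tracker/group filtering on
    // isPromotableToPrimary before computing replication targets.
    public static List<ShardCopy> replicationTargets(List<ShardCopy> copies) {
        return copies.stream()
            .filter(ShardCopy::promotableToPrimary)
            .collect(Collectors.toList());
    }
}
```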

@kingherc kingherc added >enhancement :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed Meta label for distributed team v8.7.0 labels Jan 24, 2023
@kingherc kingherc self-assigned this Jan 24, 2023
@elasticsearchmachine
Collaborator

Hi @kingherc, I've created a changelog YAML for you.

@kingherc
Contributor Author

Oh BTW, the source branch should read feature/unpromotable-skip-recovery-replication but I'll leave it as is.

Contributor

@DaveCTurner DaveCTurner left a comment


Looks great, pretty much how I hoped. I left some comments but nothing structural. I suspect this breaks the refresh API though; ideally we'd address that here (or beforehand) too.

@kingherc
Contributor Author

I have not yet handled the feedback, apart from the comment about Refresh. I just pushed a commit that re-enables Refresh for unpromotable shards. Feel free to review that as well if you'd like.

I will continue to handle the remaining feedback, and then invite you to review finally.

Contributor

@DaveCTurner DaveCTurner left a comment


I left a handful of comments, mostly tiny stuff.

Member

@pxsalehi pxsalehi left a comment


Thanks Iraklis. Just left some minor comments/questions.

@pxsalehi
Member

pxsalehi commented Jan 27, 2023

Regarding adding a new action: you'd need to add the new action to NON_OPERATOR_ACTIONS or OperatorOnlyRegistry. I think probably the former, since this is just an internal action.

@kingherc
Contributor Author

> Regarding adding a new action: you'd need to add the new action to NON_OPERATOR_ACTIONS or OperatorOnlyRegistry. I think probably the former, since this is just an internal action.

Oh thanks! I was totally unaware of this. I do see indices:admin/refresh and indices:admin/refresh[s] there, but no mention of [r]. I wonder if this is for client-driven actions rather than internally-generated ones (like the refresh [r] and [u] ones)?

@DaveCTurner might you know if I should add both [r] and [u] above?

@DaveCTurner
Contributor

I believe that registering these actions is only necessary for actions that are exposed to a Client, which I think means those with a corresponding ActionType registered in ActionModule#setupActions. These subsidiary actions are not registered there; it doesn't make sense for a client to call them directly, so they don't need to be defined as operator-only or not.

@DaveCTurner
Contributor

In fact, I think registering these transport actions is not just unnecessary, it's actively forbidden:

final Set&lt;String&gt; redundant = Sets.difference(labelledActions, allActions);
assertTrue(
    "Actions may no longer be valid: ["
        + redundant
        + "]. They should be removed from either the operator-only action registry in ["
        + OperatorOnlyRegistry.class.getName()
        + "] or the non-operator action list in ["
        + Constants.class.getName()
        + "]",
    redundant.isEmpty()
);

@pxsalehi
Member

pxsalehi commented Jan 27, 2023

> In fact, I think registering these transport actions is not just unnecessary, it's actively forbidden:

I'm not sure I understand this. But I think that if the new internal action is not in NON_OPERATOR_ACTIONS, testEveryActionIsEitherOperatorOnlyOrNonOperator fails before it reaches the part you quoted.

@DaveCTurner
Contributor

Probably simplest to see what happens if you try it:

./gradlew ':x-pack:plugin:security:qa:operator-privileges-tests:javaRestTest' --tests "org.elasticsearch.xpack.security.operator.OperatorPrivilegesIT.testEveryActionIsEitherOperatorOnlyOrNonOperator"
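The invariant that test enforces can be illustrated with a toy model (the class and method names below are illustrative, not the real OperatorOnlyRegistry/Constants code): every action labelled operator-only or non-operator must still exist among all registered actions, so the "redundant" set must be empty:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the operator-privileges check: labelled actions that are not
// among the registered actions are stale entries and must be removed.
public class ActionClassificationCheck {
    // Equivalent of Guava's Sets.difference(labelledActions, allActions),
    // implemented with the JDK collections only.
    public static Set<String> redundant(Set<String> labelledActions, Set<String> allActions) {
        Set<String> result = new HashSet<>(labelledActions);
        result.removeAll(allActions);
        return result;
    }
}
```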

Contributor Author

@kingherc kingherc left a comment


Hi all, I handled your comments and this PR is now ready for review. Please feel free to review, including the previous unresolved conversations.

- shardRoutings.addAll(routingTable.assignedShards()); // include relocation targets
+ // include relocation targets
+ shardRoutings.addAll(routingTable.assignedShards().stream().filter(ShardRouting::isPromotableToPrimary).toList());
Contributor Author


> let's just pull out the relocation targets in that loop instead of adding all the shards again.

I now realize that the iteration over the shards above already includes the relocation targets, right? I am not even sure why the old code added the assigned shards again for the relocation targets.

So we do not need this line at all in the end; I will simply remove it. Please tell me if I'm assuming wrong here.
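The point can be sketched with a toy model (illustrative names, not the real routing-table API): if the iteration already visits each copy's relocation target inline, re-adding the assigned shards afterwards only duplicates entries:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: a per-copy iteration that yields relocation targets inline,
// so a separate pass over assignedShards() would be redundant.
public class RelocationIteration {
    // relocationTargetNodeId is null when the copy is not relocating.
    public record Copy(String nodeId, String relocationTargetNodeId) {}

    // Visit each copy and, inline, its relocation target (if any).
    public static List<String> visitedNodes(List<Copy> copies) {
        List<String> nodes = new ArrayList<>();
        for (Copy copy : copies) {
            nodes.add(copy.nodeId());
            if (copy.relocationTargetNodeId() != null) {
                nodes.add(copy.relocationTargetNodeId());
            }
        }
        return nodes;
    }
}
```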

@kingherc kingherc marked this pull request as ready for review January 27, 2023 17:48
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Contributor

@DaveCTurner DaveCTurner left a comment


Looks great, there's just one more comment (an easy trap to fall into) and a few more nits.

@kingherc
Contributor Author

Hi all, I handled your remaining feedback. Feel free to check latest commits. It'd be great if you could review today if possible! Thanks

Contributor

@DaveCTurner DaveCTurner left a comment


LGTM

@kingherc kingherc merged commit cb966ef into elastic:main Jan 31, 2023
@kingherc kingherc deleted the feature/promotable-skip-recovery-replication branch January 31, 2023 09:31
mark-vieira pushed a commit to mark-vieira/elasticsearch that referenced this pull request Jan 31, 2023
For skipping replication:
* ReplicationTracker and Group filter shards that are promotable to primary
* Remove unpromotable shards from in sync allocations in metadata
* There is a new Refresh action for unpromotable replica shards

Fixes ES-4861

For skipping peer recovery:
* Unpromotable shards pass directly to STARTED skipping some intermediate peer recovery stages and messages

Fixes ES-5257
.stream()
.filter(Predicate.not(ShardRouting::isPromotableToPrimary))
.map(ShardRouting::currentNodeId)
.collect(Collectors.toUnmodifiableSet())
Contributor


Do we need this intermediate collection? I think it could be replaced with distinct().

Contributor


I thought we dropped this. We don't even need distinct(), we're looking at node IDs for the copies of a single shard which are necessarily distinct anyway.

Contributor Author


I think we dropped it from another place, but not here. I can handle this in another PR.
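For reference, a minimal sketch of the pattern under discussion, with a hypothetical `Copy` record standing in for `ShardRouting`: collecting the node IDs of the unpromotable copies of a single shard. Since each node holds at most one copy of a given shard, the IDs are already unique, so no distinct() is needed:

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Toy model: gather the node IDs of shard copies that are NOT promotable
// to primary, i.e. the copies that skip replication and peer recovery.
public class UnpromotableNodes {
    public record Copy(String nodeId, boolean promotableToPrimary) {}

    public static Set<String> unpromotableNodeIds(List<Copy> copies) {
        return copies.stream()
            .filter(Predicate.not(Copy::promotableToPrimary))
            .map(Copy::nodeId)
            .collect(Collectors.toUnmodifiableSet());
    }
}
```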
